Building lexical resources for PrincPar, a large coverage parser that generates principled semantic representations
Authors
Abstract
Parsing, one of the more successful areas of Natural Language Processing, has mostly been concerned with syntactic structure. Although uncovering the syntactic structure of sentences is very important, many applications also require a meaning representation of the input. We report on PrincPar, a parser that builds full meaning representations. It integrates LCFLEX, a robust parser, with a lexicon and ontology derived from two lexical resources, VerbNet and CoreLex, which represent the semantics of verbs and nouns respectively. We show that these two resources, one focused on verbs and the other on nouns, can be successfully integrated. We report parsing results on a corpus of instructional text and assess the coverage of those lexical resources. Our evaluation metric is the number of verb frames that are assigned a correct semantics: 72.2% of verb frames are assigned a perfect semantics, and another 10.9% are assigned a partially correct semantics. Our ultimate goal is to develop a (semi)automatic method to derive domain knowledge from instructional text, in the form of linguistically motivated action schemes.
Similar resources
Building lexical semantic representations for Natural Language instructions
We report on our work to automatically build a corpus of instructional text annotated with lexical semantics information. We have coupled the parser LCFLEX with a lexicon and ontology derived from two lexical resources, VerbNet for verbs and CoreLex for nouns. We discuss how we built our lexicon and ontology, and the parsing results we obtained.
Putting Pieces Together: Combining FrameNet, VerbNet and WordNet for Robust Semantic Parsing
This paper describes our work in integrating three different lexical resources: FrameNet, VerbNet, and WordNet, into a unified, richer knowledge-base, to the end of enabling more robust semantic parsing. The construction of each of these lexical resources has required many years of laborious human effort, and they all have their strengths and shortcomings. By linking them together, we build an ...
Customizing meaning: building domain-specific semantic representations from a generic lexicon
Language input to practical dialogue systems must be transformed into a semantic representation that is customized for use by the back-end domain reasoners. At the same time, we want to keep front-end system components as domain independent as possible for easy portability across multiple domains. We propose a transparent way to achieve domain specificity from a broad-coverage domain-independen...
Harmonised large-scale syntactic/semantic lexicons: a European multilingual infrastructure
The paper aims at providing an overview of the situation of Language Resources (LR) in Europe, in particular as emerging from a few European projects regarding the construction of large-scale harmonised resources to be used for many application purposes, including multilingual ones. An important research aspect of the projects is given by the very fact that the large enterprise described is, at ...
Towards an environment for the production and the validation of lexical semantic resources
We present the components of a processing chain for the creation, visualization, and validation of lexical resources (formed of terms and relations between terms). The core of the chain is a component for building lexical networks relying on Harris’ distributional hypothesis applied to the syntactic dependencies produced by the French parser FRMG on large corpora. Another important aspect conce...